Idioms in the Rosetta Machine Translation System

Author

  • André Schenk
Abstract

This paper discusses one of the problems of machine translation, namely the translation of idioms. The paper describes a solution to this problem within the theoretical framework of the Rosetta machine translation system. Rosetta is an experimental translation system which uses an intermediate language and translates between Dutch, English and, in the future, Spanish.

Introduction

Idioms have been and still are a basic theoretical stumbling block in most linguistic theories. For the purposes of machine translation or, in general, natural language processing, it is necessary to be able to deal with idioms because there are so many of them in every language and because they are an essential part of it. Idioms occur in sentences as a number of words, possibly scattered over the sentence and possibly with some inflected elements; this number of words has to be interpreted as having one primitive meaning. For example, in (1) "made", "peace" and "with" have to be interpreted idiomatically. Note that words that are part of the idiom are underlined.

(1) He has made his peace with his neighbour

The classic example is (2):

(2) Pete kicked the bucket

Literally this sentence means that Pete hit a specific vessel with his foot. In the idiomatic reading the interpretation is that Pete died. It is impossible to infer this idiomatic meaning directly from the primitives "Pete", "kick", "the" and "bucket" and from the way they are combined.

Idioms can undergo syntactic transformations, but sometimes they are reluctant to do so. The passive sentence (3) has lost its idiomatic reading, while in the passive sentence (4) the idiomatic reading has been retained.

(3) The bucket was kicked by Pete
(4) Mary's heart was broken by Pete

Other examples are (5-12). In the idiomatic reading in (5) clefting with the object as focus is not allowed, while it is allowed in (6) if "Mary" is stressed. Clefting with the subject as focus in both (7) and (8) is permitted. In (9) the PP "at whose door" and in (10) the NP "whose heart" can be subject to wh-movement. In (11) the NP "Mary's heart" can be topicalized (if "Mary" is stressed), but in (12) the NP "the bucket" cannot undergo this transformation without losing the idiomatic reading. Thus idioms behave syntactically like non-idiomatic structures, although sometimes they are restricted.

(5) It was the bucket that Pete kicked
(6) It was …

Similar resources

A Proposed Standard for the Lexical Representation of Idioms

In this paper I first explain briefly the properties of one type of Multi-Word Expression (MWE), viz., flexible idioms, and how they are dealt with in the Rosetta machine translation system. Taking this as a starting point and generalizing beyond it, I argue that a standardized lexical representation for flexible idioms is not so straightforward. Nevertheless, I make a very concrete proposal for an ...

Strategies Employed in Translation of Idioms in English Subtitles of Two Persian Television Series

Translation of idioms seems to be complicated for most translators since the meaning of idioms is difficult and sometimes impossible to be deduced from the meaning of their individual components. Considering the difficulties of translation of idioms and also the specific constraints of subtitling such as space and time limits, this research studied the strategies employed in translation of idio...

An Empirical Study of the Impact of Idioms on Phrase Based Statistical Machine Translation of English to Brazilian-Portuguese

This paper describes an experiment to evaluate the impact of idioms on the Statistical Machine Translation (SMT) process using the language pair English/Brazilian-Portuguese. Our results show that on sentences containing idioms a standard SMT system achieves about half the BLEU score of the same system when applied to sentences that do not contain idioms. We also provide a short error analysis and o...

Identification of Idioms by Machine Translation: a Hybrid Research System vs. Three Commercial Systems

We compare three commercial Machine Translation (MT) systems, Power Translator Pro, SYSTRAN, and T1 Langenscheidt, with the research hybrid, statistical and rule-based system, METIS-II, with respect to identification of idioms. Firstly, we make a distinction between continuous (adjacent constituents) and discontinuous idioms (non-adjacent constituents). Secondly, we describe our idiom resources...

Publication date: 1986